Approximate Nearest Neighbor Search for Low Dimensional Queries*

Sariel Har-Peled    Nirman Kumar

February 7, 2012 

Abstract

We study the Approximate Nearest Neighbor problem for metric spaces where the query points are constrained to lie on a subspace of low doubling dimension.

1 Introduction

The nearest neighbor problem is the following. Given a set $P$ of $n$ data points in a metric space $\mathcal{X}$, preprocess $P$, such that given a query point $q \in \mathcal{X}$, one can find (quickly) the point $n_q \in P$ closest to $q$. Nearest neighbor search is a fundamental task used in numerous domains including machine learning, clustering, document retrieval, databases, statistics, and many others.

Exact nearest neighbor. The problem has a naive linear time algorithm without any preprocessing. However, by doing some nontrivial preprocessing, one can achieve a sublinear search time for the nearest neighbor. In $d$-dimensional Euclidean space (i.e., $\mathbb{R}^d$) this can be done by using Voronoi diagrams [dBCvKO08]. However, this approach is only suitable for low dimensions as the complexity of the Voronoi diagram is $O(n^{\lceil d/2 \rceil})$. Specifically, Clarkson [Cla88] showed a data-structure with query time $O(\log n)$ and $O(n^{\lceil d/2 \rceil + \delta})$ space, where $\delta > 0$ is a prespecified constant (the $O(\cdot)$ notation here hides constants that are exponential in the dimension). One can trade off the space used and the query time [AM93]. Meiser [Mei93] provided a data-structure with query time $O(d^5 \log n)$ (which has polynomial dependency on the dimension), where the space used is $O(n^{d+\delta})$. These solutions are impractical even for data-sets of moderate size if the dimension is larger than two.

Approximate nearest neighbor. In typical applications, however, it is usually sufficient to return an approximate nearest neighbor (ANN). Given an $\varepsilon > 0$, a $(1+\varepsilon)$-ANN to a query point $q$ is a point $y \in P$ such that

$$d(q, y) \le (1+\varepsilon)\, d(q, n_q),$$

where $n_q \in P$ is the nearest neighbor to $q$ in $P$. A considerable amount of work was done on this problem, see [Cla06] and references therein.
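The definition above can be checked directly by brute force. The following is a minimal sketch, assuming Euclidean points stored as tuples; the helper names `nearest_neighbor` and `is_ann` are illustrative and do not come from the paper.

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.dist(a, b)

def nearest_neighbor(q, points):
    """Exact nearest neighbor of q by a linear scan (the naive algorithm)."""
    return min(points, key=lambda p: dist(q, p))

def is_ann(q, y, points, eps):
    """True if y is a (1 + eps)-ANN of q in points."""
    nq = nearest_neighbor(q, points)
    return dist(q, y) <= (1.0 + eps) * dist(q, nq)

P = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0)]
q = (0.45, 0.0)
# (1, 0) is at distance 0.55 while the true nearest neighbor (0, 0) is at
# distance 0.45, so (1, 0) is acceptable only when eps is large enough.
ok_loose = is_ann(q, (1.0, 0.0), P, 0.25)
ok_tight = is_ann(q, (1.0, 0.0), P, 0.10)
```

The linear scan is only a reference implementation for testing; the data-structures discussed in the text exist precisely to avoid it.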

In high dimensional Euclidean space, Indyk and Motwani showed that ANN can be reduced to a small number of near neighbor queries [IM98]. Next, using locality sensitive hashing they provide a data-structure that answers ANN queries in time (roughly) $O(n^{1/(1+\varepsilon)})$ and preprocessing time and space $O(n^{1+1/(1+\varepsilon)})$. This was improved to $O(n^{1/(1+\varepsilon)^2})$ query time, and preprocessing time and space $O(n^{1+1/(1+\varepsilon)^2})$ [AI06, AI08]. These bounds are near optimal [MNP06].

In low dimensions (i.e., $\mathbb{R}^d$), one can use linear space (independent of $\varepsilon$) and get ANN query time $O(\log n + 1/\varepsilon^{d-1})$ [AMN+98, Har10]. Interestingly, for this data-structure, the approximation parameter $\varepsilon$ is not prespecified during the construction; one needs to provide it only during the query. An alternative approach is to use Approximate Voronoi Diagrams (AVD), introduced by Har-Peled [Har01], which are partitions of space into regions, desirably of low complexity, typically with a representative point for each region that is an ANN for any point in the region. In particular, Har-Peled showed that there is such a decomposition of size $O((n/\varepsilon^d) \log n)$, such that ANN queries can be answered in $O(\log n)$ time. Arya and Malamatos [AM02] showed how to build AVDs of linear complexity (i.e., $O(n/\varepsilon^d)$). Their construction uses Well Separated Pair Decompositions [CK95]. Further tradeoffs between query time and space for AVDs were studied by Arya et al. [AMM09].

Metric spaces. One possible approach for the more general case, when the data lies in some abstract metric space, is to define a notion of dimension and develop efficient algorithms in these settings. This approach is motivated by the belief that real world data is "low dimensional" in many cases, and should be easier to handle than true high dimensional data. An example of this approach is the notion of doubling dimension [Ass83, Hei01, GKL03]. The doubling constant of a metric space $\mathcal{X}$ is the maximum, over all balls $b$ in the metric space $\mathcal{X}$, of the minimum number of balls needed to cover $b$, using balls with half the radius of $b$. The logarithm of the doubling constant is the doubling dimension of the space. The doubling dimension can be thought of as a generalization of the Euclidean dimension, as $\mathbb{R}^d$ has $\Theta(d)$ doubling dimension. Furthermore, the doubling dimension extends the notion of growth restricted metrics of Karger and Ruhl [KR02].
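As a rough illustration of the covering step in this definition, the sketch below greedily covers one ball of a finite point set by balls of half the radius. It assumes Euclidean points stored as tuples, and the helper name `greedy_half_cover` is hypothetical. A greedy cover only gives an upper bound on the minimum number of half-radius balls, so the logarithm of the count is an upper estimate for that single ball, not the doubling dimension itself.

```python
import math

def greedy_half_cover(points, center, radius, dist=math.dist):
    """Greedily cover the points of ball(center, radius) by balls of radius/2.

    Returns the chosen centers. Their number is an upper bound on the size
    of a minimum cover of this ball by half-radius balls.
    """
    uncovered = [x for x in points if dist(center, x) <= radius]
    centers = []
    while uncovered:
        c = uncovered[0]                       # any uncovered point works
        centers.append(c)
        # Keep only the points that the new half-radius ball misses.
        uncovered = [x for x in uncovered if dist(c, x) > radius / 2]
    return centers

# Seventeen evenly spaced points on a line, all inside ball((8,), 8).
line = [(float(i),) for i in range(17)]
cover = greedy_half_cover(line, center=(8.0,), radius=8.0)
estimate = math.log2(len(cover))
```

On this input the greedy pass picks the centers 0, 5, 10 and 15, so the estimate is 2, while two well placed balls would already suffice on a line; this gap is why the result should be read as an upper bound only.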

The problem of ANN in spaces of low doubling dimension was studied in [KR02, HKMR04]. Talwar [Tal04] presented several algorithms for spaces of low doubling dimension. Some of them, however, depend on the spread of the point set. Krauthgamer and Lee [KL04] presented a net navigation algorithm for ANN in spaces of low doubling dimension. Har-Peled and Mendel [HM06] provided data-structures for ANN search that use linear space and match the bounds known for $\mathbb{R}^d$ [AMN+98]. Clarkson [Cla06] presents several algorithms for nearest neighbor search in low dimensional spaces for various notions of dimensions.

ANN in high and low dimensions. As indicated above, the ANN problem is easy in low dimensions (either Euclidean or bounded doubling dimension). If the dimension is high the problem is considerably more challenging. There is considerable work on ANN in high dimensional Euclidean space (see [IM98, KOR00]) but the query time is only slightly sublinear if $\varepsilon$ is close to 0. In general metric spaces, it is easy to argue that (in the worst case) the ANN algorithm must compute the distance of the query point to all the input points.

It is natural to ask therefore what happens when the data (or the queries) come from 
a low dimensional subspace that lies inside a high dimensional ambient space. Such cases 
are interesting, as it is widely believed that in practice, real world data usually lies on a 
low dimensional manifold (or is close to lying on such manifold). Such low-dimensionality 
arises from the way the data is being acquired, inherent dependency between parameters, 
aggregation of data that leads to concentration of mass phenomena, etc. 

Indyk and Naor [IN07] showed that if the data is in high dimensional Euclidean space, but lies on a manifold with low doubling dimension, then one can do a dimension reduction into constant dimension (i.e., similar in spirit to the JL lemma [JL84]), such that $(1+\varepsilon)$-ANN to a query point (the query point might lie anywhere in the ambient space) is preserved with constant probability. Using an appropriate data-structure on the embedded space and repeating this process a sufficient number of times results in a data-structure that can answer such ANN queries in polylog time (ignoring the dependency on $\varepsilon$).

The problem. In this paper, we study the "reverse" problem. Here we are given a high dimensional data set $P$, and we would like to preprocess it for ANN queries, where the queries come from a low-dimensional subspace/manifold $\mathcal{M}$. The question arises naturally when the given data is formed by a large number of data sets, while the ANN queries come from a single data set.

In particular, the meta question here is whether this problem is low or high dimensional in 
nature. Note, direct dimension reduction as done by Indyk and Naor would not work in this 
case. Indeed, imagine the data lies densely on a slightly deformed sphere in high dimensions, 
and the query is the center of the sphere. Clearly, a random dimension reduction into 
constant dimension would not preserve the $(1+\varepsilon)$-ANN (with high probability).

Our results. Given a point set $P$ in a general metric space $\mathcal{X}$ (which is not necessarily Euclidean and is conceptually high dimensional), and a subspace $\mathcal{M}$ having low doubling dimension, we show how to preprocess $P$ such that given any query point in $\mathcal{M}$ we can quickly answer $(1+\varepsilon)$-ANN queries on $P$. In particular, we get data-structures of (roughly) linear size that answer $(1+\varepsilon)$-ANN queries in (roughly) logarithmic time.

Our construction uses ideas developed for handling the low dimensional case. Initially, we embed $P$ and $\mathcal{M}$ into a space with low doubling dimension that (roughly) preserves distances between $\mathcal{M}$ and $P$. We can use the embedded space to answer constant factor ANN queries. Getting a better approximation requires some further ideas. In particular, we build a data-structure over $\mathcal{M}$ that is remotely similar to Approximate Voronoi Diagrams [Har01]. By sprinkling points carefully on the subspace $\mathcal{M}$ and using the net-tree data-structure [HM06] we can answer $(1+\varepsilon)$-ANN queries in time $O(\varepsilon^{-O(\dim)} + 2^{O(\dim)} \log n)$.

To get a better query time requires some further work. In particular, we borrow ideas from the simplified construction of Arya and Malamatos [AM02] (see also [AMM09]). Naively, this requires us to use a well separated pairs decomposition (i.e., WSPD) [CK95] for $P$. Unfortunately, no such small WSPD exists for data in high dimensions. To overcome this problem, we build the WSPD in the embedded space. Next, we use this to guide us in the construction of the ANN data-structure. This results in a data-structure that can answer $(1+\varepsilon)$-ANN queries in $O(2^{O(\dim)} \log n)$ time. See Section 5 for details.

We also present an algorithm for a weaker model, where the query subspace is not given 
to us directly. Instead, every time an ANN query is issued, the algorithm computes a region 
around the query point such that the returned point is a valid ANN for all the points in the 
region. Furthermore, the algorithm caches such regions, and whenever a query arrives it first 
checks if the query point is already contained in one of the regions computed, and if so it 
answers the ANN query immediately. Significantly, for this algorithm we need no prespecified 
knowledge about the query subspace. The resulting algorithm computes an AVD on the query subspace on the fly. In particular, we show that if the queries come from a subspace with doubling dimension $\dim$ then the algorithm would create at most $n/\varepsilon^{O(\dim)}$ regions overall. A restriction of this new algorithm is that we do not currently know how to efficiently perform a point-location query in a set of such regions, without assuming further knowledge about the subspace. Interestingly, the new algorithm can be interpreted as learning the underlying subspace/manifold the queries come from. See Section 6 for the precise result.



Organization. In Section 2 we define some basic concepts, and as a warm-up exercise study the problem where the subspace $\mathcal{M}$ is a linear subspace of $\mathbb{R}^d$; this provides us with some intuition for the general case. We also present the embedding of $P$ and $\mathcal{M}$ into the metric space $\mathcal{M}'$, which has low doubling dimension while (roughly) preserving distances of interest. In Section 3 we provide a data-structure for constant factor ANN using this embedding. In Section 4 we use the constant factor ANN to get a data-structure for answering $(1+\varepsilon)$-ANN. In Section 5, we use WSPD to build a data-structure that is similar in spirit to AVDs. This results in a data-structure with slightly faster ANN query time. The on the fly construction of AVD to answer ANN queries without assuming any knowledge of the query subspace is described in Section 6. Finally, conclusions are provided in Section 7.

2 Preliminaries 

The Problem. We look at the ANN problem in the following setting. Given a set $P$ of $n$ data points in a metric space $\mathcal{X}$, a set $\mathcal{M} \subseteq \mathcal{X}$ of (hopefully low) doubling dimension $\dim$, and $\varepsilon > 0$, we want to preprocess the points of $P$, such that given a query point $q \in \mathcal{M}$ one can efficiently find a $(1+\varepsilon)$-ANN of $q$ in $P$.



Model. We are given a metric space $\mathcal{X}$ and a subset $\mathcal{M} \subseteq \mathcal{X}$ of doubling dimension $\dim$. We assume that the distance between any pair of points can be computed in constant time in a black-box fashion. We also assume that one can build nets on $\mathcal{M}$. Specifically, given a point $p \in \mathcal{M}$ and a radius $r > 0$, we assume we can compute $2^{O(\dim)}$ points $p_i \in \mathcal{M}$, such that $\mathrm{ball}(p, r) \cap \mathcal{M} \subseteq \bigcup_i \mathrm{ball}(p_i, r/2)$. By applying this recursively we can compute an $r$-net $N$ for any $\mathrm{ball}(p, R)$ centered at $p$; that is, for any point $s \in \mathrm{ball}(p, R)$ there exists a point $u \in N$ such that $d(s, u) \le r$. Let compNet$(p, R, r)$ denote this algorithm for computing this $r$-net. The size of $N$ is $(R/r)^{O(\dim)}$, and we assume this also bounds the time it takes to compute it.
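The covering property required from compNet can be illustrated with a simple greedy construction. This is a sketch under a strong assumption that is not in the paper: the subspace $\mathcal{M}$ is available as a finite sample of points, whereas the model above only assumes an oracle. The function name `comp_net` and the sample `M_sample` are hypothetical.

```python
import math

def comp_net(center, R, r, sample, dist=math.dist):
    """Greedy r-net of ball(center, R) restricted to a finite sample of M.

    Every sample point inside the ball ends up within distance r of some
    net point, and net points are pairwise more than r apart.
    """
    net = []
    for s in sample:
        if dist(center, s) > R:
            continue                      # outside the ball of interest
        if all(dist(s, u) > r for u in net):
            net.append(s)                 # s is not covered yet
    return net

# M is sampled as points on the x-axis of the plane.
M_sample = [(i / 10.0, 0.0) for i in range(-50, 51)]
net = comp_net((0.0, 0.0), R=2.0, r=0.5, sample=M_sample)
```

The greedy pass takes time proportional to the sample size times the net size, so it says nothing about the $(R/r)^{O(\dim)}$ running time assumed in the model; it only shows what the output has to satisfy.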

Finally, given any point $p \in \mathcal{X}$ we assume that one can compute, in $O(1)$ time, a point $\alpha(p) \in \mathcal{M}$ such that $\alpha(p)$ is the closest point in $\mathcal{M}$ to $p$. (Alternatively, $\alpha(p)$ might be specified for each point of $P$ in advance.)

Well separated pairs decomposition. For a point set $P$, a pair decomposition of $P$ is a set of pairs $W = \{\{A_1, B_1\}, \ldots, \{A_s, B_s\}\}$, such that (I) $A_i, B_i \subseteq P$ for every $i$, (II) $A_i \cap B_i = \emptyset$ for every $i$, and (III) $\bigcup_{i=1}^{s} A_i \otimes B_i = P \otimes P$.

A pair $Q \subseteq P$ and $R \subseteq P$ is $(1/\varepsilon)$-separated if $\max(\mathrm{diam}(Q), \mathrm{diam}(R)) \le \varepsilon \cdot d(Q, R)$, where $d(Q, R) = \min_{p \in Q,\, s \in R} d(p, s)$. For a point set $P$, a well-separated pair decomposition (WSPD) of $P$ with parameter $1/\varepsilon$ is a pair decomposition of $P$ with a set of pairs $W = \{\{A_1, B_1\}, \ldots, \{A_s, B_s\}\}$, such that, for any $i$, the sets $A_i$ and $B_i$ are $\varepsilon^{-1}$-separated [CK95].
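The separation condition is easy to test for a concrete pair of sets. The sketch below assumes Euclidean points stored as tuples and uses brute-force diameter and set-distance computations; the helper names are illustrative, and this checks a single pair rather than constructing a WSPD.

```python
import math
from itertools import combinations

def diam(A, dist=math.dist):
    """Diameter of a finite point set (0 for a single point)."""
    return max((dist(a, b) for a, b in combinations(A, 2)), default=0.0)

def set_dist(A, B, dist=math.dist):
    """d(A, B): the minimum distance between a point of A and a point of B."""
    return min(dist(a, b) for a in A for b in B)

def is_separated(A, B, eps, dist=math.dist):
    """True if A and B are (1/eps)-separated."""
    return max(diam(A, dist), diam(B, dist)) <= eps * set_dist(A, B, dist)

A = [(0.0, 0.0), (1.0, 0.0)]
B = [(10.0, 0.0), (10.0, 1.0)]
# Both diameters are 1 and d(A, B) = 9, so the pair is 8-separated
# (eps = 1/8) but not 10-separated (eps = 1/10).
sep_8 = is_separated(A, B, 1 / 8)
sep_10 = is_separated(A, B, 1 / 10)
```

Note that any two distinct singletons are separated for every $\varepsilon > 0$, which is why a (quadratic size) WSPD always exists; the interesting question is when a small one does.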

2.1 Warm-up exercise: Affine Subspace. 

We first consider the case where our query subspace is an affine subspace embedded in $d$ dimensional Euclidean space. Thus let $\mathcal{X} = \mathbb{R}^d$ with the usual Euclidean metric. Suppose our query subspace $\mathcal{M}$ is an affine subspace of dimension $k$ where $k \ll d$. We are also given $n$ data points $P = \{p_1, p_2, \ldots, p_n\}$. We want to preprocess $P$ such that given a $q \in \mathcal{M}$ we can quickly find a point $p_i \in P$ which is a $(1+\varepsilon)$-ANN of $q$ in $P$.

We choose an orthonormal system of coordinates for $\mathcal{M}$. Denote the projection of a point $p$ to $\mathcal{M}$ as $\alpha(p)$. Denote the coordinates of a point $\alpha(p) \in \mathcal{M}$ in the chosen coordinate system as $(p^1, p^2, \ldots, p^k)$. Let $h(p)$ denote the distance of a point $p \in \mathbb{R}^d$ from the subspace $\mathcal{M}$. Notice that $h(p) = \|p - \alpha(p)\|$. Consider the following embedding: $p' = (p^1, p^2, \ldots, p^k, h(p)) \in \mathbb{R}^{k+1}$.

It is easy to see that for $x \in \mathcal{M}$ and $y \in \mathbb{R}^d$, by the Pythagorean theorem, $\|x - y\|^2 = \|x - \alpha(y)\|^2 + \|\alpha(y) - y\|^2 = \|x - \alpha(y)\|^2 + h(y)^2 = \|x' - y'\|^2$. So, $\|x - y\| = \|x' - y'\|$.

As such, if we can find a $(1+\varepsilon)$-ANN $p'_i$ of $q'$ in $\mathbb{R}^{k+1}$ then $p_i$ is a $(1+\varepsilon)$-ANN of $q$. But this is easy to do using known data-structures for ANN [AMN+98], or the data-structures for approximate Voronoi diagrams [Har01, AM02].

Thus, we have $n$ points in $\mathbb{R}^{k+1}$ to preprocess and without loss of generality we can assume that the $p'_i$ are all distinct. Now given $\varepsilon \le 1/2$, we can preprocess the points $\{p'_1, \ldots, p'_n\}$ and construct an approximate Voronoi diagram consisting of $O(n \varepsilon^{-(k+1)} \log \varepsilon^{-1})$ regions. Each such region is the difference of two cubes. Given a point $q' \in \mathbb{R}^{k+1}$ we can find a $(1+\varepsilon)$-ANN in time $O(\log(n/\varepsilon))$.
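The warm-up embedding can be tried on a small example. The sketch below makes a simplifying assumption that is not in the text: the subspace $\mathcal{M}$ is the span of the first $k$ coordinate axes of $\mathbb{R}^d$, so that the chosen orthonormal coordinates of $\alpha(p)$ are just the first $k$ coordinates of $p$. The function name `embed` is illustrative.

```python
import math

def embed(p, k):
    """Map p in R^d to (first k coordinates, height above M) in R^(k+1).

    M is assumed to be spanned by the first k coordinate axes, so the
    height h(p) is the norm of the remaining d - k coordinates.
    """
    height = math.sqrt(sum(c * c for c in p[k:]))
    return tuple(p[:k]) + (height,)

k = 2
x = (1.0, 2.0, 0.0, 0.0, 0.0)       # a query point on M
y = (4.0, -2.0, 3.0, 4.0, 12.0)     # a data point off M
d_original = math.dist(x, y)                     # sqrt(194)
d_embedded = math.dist(embed(x, k), embed(y, k))
```

Only distances between a point on $\mathcal{M}$ and an arbitrary point are preserved exactly; two data points that both lie off $\mathcal{M}$ can move closer together, which is harmless here because queries always come from $\mathcal{M}$.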

2.2 An Embedding. 

We show how to embed the points of $P$ (and in fact all of $\mathcal{X}$) into another metric space $\mathcal{M}'$ with finite doubling dimension, such that the distances between $P$ and $\mathcal{M}$ are roughly preserved.

For a point $p \in \mathcal{X}$, let $\alpha(p)$ denote the closest point in $\mathcal{M}$ to $p$ (for the sake of simplicity of exposition we assume this point is unique). The height of a point $p \in \mathcal{X}$ is the distance between $p$ and $\alpha(p)$; namely, $h(p) = d(p, \alpha(p))$. Generalizing this, for a given set $A \subseteq \mathcal{X}$, we will let $\alpha(A)$ denote the set $\{\alpha(x) \mid x \in A\}$.

The metric space $\mathcal{M}'$ is $\mathcal{M} \times \mathbb{R}^+$. The embedding $\phi : \mathcal{X} \to \mathcal{M}'$ maps a point $p \in \mathcal{X}$ into the point $\phi(p) = (\alpha(p), h(p))$. For a point $p \in \mathcal{X}$, we use $p' = \phi(p)$ to denote the embedded point. The distance between any two points $p' = (\alpha(p), h(p))$ and $s' = (\alpha(s), h(s))$ of $\mathcal{M}'$ is defined as

$$d_{\mathcal{M}'}(p', s') = d_{\mathcal{X}}(\alpha(p), \alpha(s)) + |h(p) - h(s)|.$$

It is easy to verify that $d_{\mathcal{M}'}(\cdot, \cdot)$ complies with the triangle inequality. For the sake of simplicity of exposition, we assume that for any two distinct points $p$ and $s$ in our (finite) input point set $P$ it holds that $p' \ne s'$ (that is, $d_{\mathcal{M}'}(p', s') \ne 0$). This can be easily guaranteed by introducing symbolic perturbations.

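Lemma 2.1 The following holds: (A) For any two points $x, y \in \mathcal{M}$, we have $d_{\mathcal{M}'}(x', y') = d_{\mathcal{X}}(x, y)$. (B) For any point $x \in \mathcal{M}$ and $y \in \mathcal{X}$, we have $d_{\mathcal{X}}(x, y) \le d_{\mathcal{M}'}(x', y') \le 3 d_{\mathcal{X}}(x, y)$. (C) The metric space $\mathcal{M}'$ has doubling dimension at most $2\dim + 2$.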

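Proof: (A) Clearly, for $x, y \in \mathcal{M}$, we have $x' = (x, 0)$ and $y' = (y, 0)$. As such, $d_{\mathcal{M}'}(x', y') = d_{\mathcal{X}}(x, y) + |0 - 0| = d_{\mathcal{X}}(x, y)$.

(B) Let $x \in \mathcal{M}$ and $y \in \mathcal{X}$. We have $x' = (x, 0)$ and $y' = (\alpha(y), d_{\mathcal{X}}(y, \alpha(y)))$. As such,

$$\begin{aligned}
d_{\mathcal{M}'}(x', y') &= d_{\mathcal{X}}(\alpha(x), \alpha(y)) + |0 - h(y)| \\
&= d_{\mathcal{X}}(x, \alpha(y)) + d_{\mathcal{X}}(\alpha(y), y) \\
&\ge d_{\mathcal{X}}(x, y),
\end{aligned}$$

by the triangle inequality. On the other hand, because $d_{\mathcal{X}}(y, \alpha(y)) \le d_{\mathcal{X}}(y, x)$,

$$\begin{aligned}
d_{\mathcal{M}'}(x', y') &= d_{\mathcal{X}}(x, \alpha(y)) + d_{\mathcal{X}}(y, \alpha(y)) \\
&\le d_{\mathcal{X}}(x, y) + 2 d_{\mathcal{X}}(y, \alpha(y)) \\
&\le 3 d_{\mathcal{X}}(x, y).
\end{aligned}$$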

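(C) Let $(p, a)$ be a point in $\mathcal{M}'$ and consider the ball $b = \mathrm{ball}_{\mathcal{M}'}((p, a), r) \subseteq \mathcal{M}'$ of radius $r$ with center $(p, a)$. Consider the projection of $b$ into $\mathcal{M}$; that is, $P_{\mathcal{M}} = \{ s \mid (s, h) \in b \}$. Similarly, let $P_{\mathbb{R}} = \{ h \mid (s, h) \in b \}$.

Clearly, $\mathrm{ball}_{\mathcal{M}'}((p, a), r) \subseteq P_{\mathcal{M}} \times P_{\mathbb{R}}$, and $P_{\mathcal{M}}$ is contained in the ball $\mathrm{ball}_{\mathcal{M}}(p, r) = \mathrm{ball}_{\mathcal{X}}(p, r) \cap \mathcal{M}$. Since the doubling dimension of $\mathcal{M}$ is $\dim$, this ball can be covered by $2^{2\dim}$ balls $\mathrm{ball}_{\mathcal{X}}(p_i, r/4)$ with centers $p_i \in \mathcal{M}$.

Also, since every $(s, h) \in b$ satisfies $|h - a| \le r$, the set $P_{\mathbb{R}} \subseteq \mathbb{R}$ is contained in an interval of length at most $2r$, so it can be covered by at most 4 intervals $I_1, I_2, I_3, I_4$ of length $r/2$ each, centered at values $x_1, x_2, x_3, x_4$, respectively. Then,

$$\begin{aligned}
\mathrm{ball}_{\mathcal{M}'}((p, a), r) &\subseteq P_{\mathcal{M}} \times P_{\mathbb{R}} \\
&\subseteq \bigcup_{i} \bigcup_{j=1}^{4} \bigl(\mathrm{ball}_{\mathcal{X}}(p_i, r/4) \cap \mathcal{M}\bigr) \times I_j \\
&\subseteq \bigcup_{i} \bigcup_{j=1}^{4} \mathrm{ball}_{\mathcal{M}'}((p_i, x_j), r/2),
\end{aligned}$$

since any point of $(\mathrm{ball}_{\mathcal{X}}(p_i, r/4) \cap \mathcal{M}) \times I_j$ is at distance at most $r/4 + r/4$ from $(p_i, x_j)$, and is thus contained in $\mathrm{ball}_{\mathcal{M}'}((p_i, x_j), r/2)$. We conclude that $\mathrm{ball}_{\mathcal{M}'}((p, a), r)$ can be covered using at most $2^{2\dim + 2}$ balls of half the radius. ■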
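Parts (A) and (B) of Lemma 2.1 can be sanity-checked numerically in a toy setting. The sketch below assumes, purely for illustration, that $\mathcal{X}$ is the Euclidean plane and $\mathcal{M}$ is the x-axis, so that $\alpha(p)$ is the vertical projection; the function names are hypothetical.

```python
import math

def alpha(p):
    """Closest point to p on M, here assumed to be the x-axis of the plane."""
    return (p[0], 0.0)

def height(p):
    """h(p): distance from p to its closest point on M."""
    return math.dist(p, alpha(p))

def embed(p):
    """The embedded point p' = (alpha(p), h(p))."""
    return (alpha(p), height(p))

def d_embedded(pe, se):
    """Distance in M': distance of the projections plus height difference."""
    (a, hp), (b, hs) = pe, se
    return math.dist(a, b) + abs(hp - hs)

x = (0.0, 0.0)                               # a point on M
y = (3.0, 4.0)                               # a point off M
d_x = math.dist(x, y)                        # 5.0
d_m = d_embedded(embed(x), embed(y))         # 3.0 + 4.0 = 7.0
```

Here the embedded distance 7 lies between the original distance 5 and three times that distance, as part (B) predicts. In this planar setting the ratio never exceeds $\sqrt{2}$; the factor 3 is needed for general metric spaces.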

3 A Constant Factor ANN Algorithm 



In the preprocessing stage, we map the points of $P$ into the metric space $\mathcal{M}'$ of Lemma 2.1. Build a net-tree for the point set $P' = \{p' \mid p \in P\}$ in $\mathcal{M}'$ and preprocess it for ANN queries using the data-structure of Har-Peled and Mendel [HM06]. Let $\mathcal{D}$ denote the resulting data-structure.

Answering a query. Given $q \in \mathcal{M}$, we compute a 2-ANN to $q' \in \mathcal{M}'$. Let this be the point $y'$. Return $d(q, y)$.

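Correctness. Let $n_q$ be the nearest neighbor of $q$ in $P$ and $y$ the point returned. We have,

$$\begin{aligned}
d_{\mathcal{M}'}(q', n'_q) &= d_{\mathcal{X}}(q, \alpha(n_q)) + h(n_q) \\
&\le d_{\mathcal{X}}(q, n_q) + h(n_q) + h(n_q) \\
&\le 3 d_{\mathcal{X}}(q, n_q).
\end{aligned}$$

As $y'$ is a 2-ANN for $q'$ and $q \in \mathcal{M}$, we have

$$d_{\mathcal{X}}(q, y) \le d_{\mathcal{M}'}(q', y') \le 2 d_{\mathcal{M}'}(q', n'_q) \le 6 d_{\mathcal{X}}(q, n_q).$$

We thus proved the following.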

Lemma 3.1 Given a set $P \subseteq \mathcal{X}$ of $n$ points and a subspace $\mathcal{M}$ of doubling dimension $\dim$, one can build a data-structure in $2^{O(\dim)} n \log n$ expected time, such that given a query point $q \in \mathcal{M}$, one can return a 6-ANN to $q$ in $P$ in $2^{O(\dim)} \log n$ query time. The space used by this data-structure is $2^{O(\dim)} n$.

Proof: Since the doubling dimension of $\mathcal{M}'$ is at most $2\dim + 2$, building the net-tree and preprocessing it for ANN queries takes $2^{O(\dim)} n \log n$ expected time, and the space used is $2^{O(\dim)} n$ [HM06]. The 2-ANN query for a point $q$ takes time $2^{O(\dim)} \log n$. ■
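The query procedure of this section can be mimicked with a brute-force search in the embedded space. This is a minimal sketch, not the net-tree data-structure of [HM06]: it assumes the toy setting where $\mathcal{X}$ is the plane and $\mathcal{M}$ is the x-axis, and it replaces the 2-ANN query in $\mathcal{M}'$ by an exact linear scan (an exact nearest neighbor is in particular a 2-ANN). All function names are hypothetical.

```python
import math

def alpha(p):
    """Projection onto M, assumed here to be the x-axis of the plane."""
    return (p[0], 0.0)

def embed(p):
    """p' = (alpha(p), h(p))."""
    a = alpha(p)
    return (a, math.dist(p, a))

def d_embedded(pe, se):
    """The metric of M' from Section 2.2."""
    return math.dist(pe[0], se[0]) + abs(pe[1] - se[1])

def constant_factor_ann(q, P):
    """Return the point of P whose image is nearest to q' in M'.

    The exact scan stands in for the 2-ANN net-tree query, so the
    6-ANN guarantee of Lemma 3.1 still applies to the result.
    """
    qe = embed(q)
    return min(P, key=lambda p: d_embedded(qe, embed(p)))

P = [(0.0, 5.0), (4.0, 0.5), (-3.0, 2.0), (1.0, 3.0)]
q = (1.0, 0.0)
y = constant_factor_ann(q, P)
```

For this query the embedded distances are 6, 3.5, 6 and 3, so the point $(1, 3)$ is returned. The scan costs linear time per query; the point of the section is that the same search can be delegated to a net-tree because $\mathcal{M}'$ has low doubling dimension.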

4 Answering $(1+\varepsilon)$-ANN

Once we have a constant factor approximation to the nearest-neighbor in $P$ it is not too hard to boost it into $(1+\varepsilon)$-ANN. To this end, we need to understand what the net-tree [HM06] provides us with. The following is implied by fiddling with the ANN algorithm of [HM06].

Lemma 4.1 Given a net-tree for a set $Q \subseteq \mathcal{M}$ of $n$ points in a metric space with doubling dimension $\dim$, and given a point $p \in \mathcal{M}$ and radii $r \le R$, one can compute an $r$-net $N$ of $Q$, such that the following properties hold:

(A) For any point $s \in Q \cap \mathrm{ball}(p, R)$ there exists a point $u \in N$ such that $d(s, u) \le r$.

(B) $|N| = (R/r)^{O(\dim)}$.

(C) Each point $u \in N$ corresponds to a node $v(u)$ in the net-tree. Let $Q_{v(u)}$ denote the subset of points of $Q$ stored in the subtree of $v(u)$. The union $\bigcup_{u \in N} Q_{v(u)}$ covers $Q \cap \mathrm{ball}(p, R)$.

(D) For any $u \in N$, the diameter of the point set $Q_{v(u)}$ is bounded by $r$.

(E) The time to compute $N$ is $2^{O(\dim)} \log n + O(|N|)$.

Construction. For every point $p \in P$ we compute an $r(p)$-net $U(p)$ for $\mathrm{ball}_{\mathcal{M}}(\alpha(p), R(p))$, where $r(p) = \varepsilon h(p)/(20 c_1)$ and $R(p) = c_1 h(p)/\varepsilon$. Here $c_1$ is some sufficiently large constant. This net is computed using the algorithm compNet, see Section 2. This takes $1/\varepsilon^{O(\dim)}$ time to compute for each point of $P$.

For each point $u$ of the net $U(p) \subseteq \mathcal{M}$ store the original point $p$ it arises from, and the distance to the original point $p$. We will refer to $s(u) = d(u, p)$ as the reach of $u$.

Let $Q \subseteq \mathcal{M}$ be the union of all these nets. Clearly, we have that $|Q| = n/\varepsilon^{O(\dim)}$. Build a net-tree $T$ for the points of $Q$. We compute in a bottom-up fashion for each node $v$ of the net-tree $T$ the point with the smallest reach stored in $Q_v$.

Answering a query. Given a query point $q \in \mathcal{M}$, compute using the algorithm of Lemma 3.1 a 6-ANN to $q$ in $P$. Let $\Delta$ be the distance from $q$ to this ANN. Let $R = 20\Delta$ and $r' = \varepsilon\Delta/20$. Using $T$ and Lemma 4.1, compute an $r'$-net $N$ of $\mathrm{ball}_{\mathcal{M}}(q, R)$.

Next, for each point $u \in N$ consider its corresponding node $v(u) \in T$. Each such node stores a point of minimum reach in $Q_{v(u)}$. We compute the distance to each such minimum-reach point and return the nearest-neighbor found as the ANN.

Theorem 4.2 Given a set $P \subseteq \mathcal{X}$ of $n$ points and a subspace $\mathcal{M}$ of doubling dimension $\dim$, and a parameter $\varepsilon > 0$, one can build a data-structure in $n \varepsilon^{-O(\dim)} \log n$ expected time, such that given a query point $q \in \mathcal{M}$, one can return a $(1+\varepsilon)$-ANN to $q$ in $P$ in $2^{O(\dim)} \log n + \varepsilon^{-O(\dim)}$ query time. This data-structure uses $n \varepsilon^{-O(\dim)}$ space.



Proof: We only need to prove the bound on the quality of the approximation. Consider the nearest-neighbor $n_q$ to $q$ in $P$.

(A) If there is a point $z \in U(n_q) \subseteq Q$ in distance at most $r'$ from $q$ then there is a net point $u$ of $N$ that contains $z$ in its subtree of $T$. Let $w_y$ be the point of minimum reach in $Q_{v(u)}$, and let $y \in P$ be the corresponding original point. Now, we have

$$d(q, y) \le d(q, w_y) + d(w_y, y) \le d(q, w_y) + d(z, n_q),$$

as the point $w_y$ has reach $d(w_y, y)$, $w_y$ is the point of minimal reach among all the points of $Q_{v(u)}$, $z \in Q_{v(u)}$, and $d(z, n_q)$ is the reach of $z$. So, by the triangle inequality, we have

$$\begin{aligned}
d(q, y) &\le d(q, w_y) + d(q, n_q) + d(z, q) \\
&\le \bigl(d(q, z) + d(z, w_y)\bigr) + d(q, n_q) + d(z, q) \\
&\le d(q, n_q) + 3r',
\end{aligned}$$

as $z, w_y \in Q_{v(u)}$ and the diameter of $Q_{v(u)}$ is at most $r'$. So we have,

$$d(q, y) \le d(q, n_q) + 3\varepsilon\Delta/20 \le (1+\varepsilon)\, d(q, n_q).$$

(B) Otherwise, it must be that $d(q, U(n_q)) > r'$. Observe that it must be that $r(n_q) < r'$ as $h(n_q) \le \Delta$. It must be therefore that the query point is outside the region covered by the net $U(n_q)$. As such, we have

$$R(n_q) = \frac{c_1 h(n_q)}{\varepsilon} \le d(\alpha(n_q), q) \le d(q, n_q) + d(n_q, \alpha(n_q)) \le 2 d(n_q, q) \le 2\Delta,$$

which means $h(n_q) \le 2\varepsilon\Delta/c_1$. Namely, the height of the point $n_q$ is insignificant in comparison to its distance from $q$ (and conceptually can be considered to be zero). In particular, consider the net point $u \in N$ that contains $\alpha(n_q)$ in its subtree. The point of smallest reach in this subtree provides a $(1+\varepsilon)$-ANN as an easy but tedious argument similar to the one above shows. ■



5 Answering $(1+\varepsilon)$-ANN faster

In this section, we extend the approach used in the above construction to get a data-structure which is similar in spirit to an AVD of $P$ on $\mathcal{M}$. Specifically, we spread a set of points $C$ on $\mathcal{M}$, and we associate a point of $P$ with each one of them. Now, answering 2-ANN on $C$, and returning the point of $P$ associated with this point, results in the desired $(1+\varepsilon)$-ANN.

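algBuildANN($P$, $\mathcal{M}$).
    $P' = \{x' \mid x \in P\}$
    Compute an 8-WSPD $W = \{\{A'_1, B'_1\}, \ldots, \{A'_s, B'_s\}\}$ of $P'$
    for $\{A'_i, B'_i\} \in W$ do
        Choose points $a'_i \in A'_i$ and $b'_i \in B'_i$.
        $t_i = d_{\mathcal{M}'}(a'_i, b'_i)$, $T_i = t_i + h_{\max}(A'_i) + h_{\max}(B'_i)$
        $R_i = c_2 T_i/\varepsilon$, $r_i = \varepsilon T_i/c_2$
        $N_i = \mathrm{compNet}(\alpha(a_i), R_i, r_i) \cup \mathrm{compNet}(\alpha(b_i), R_i, r_i)$.
    $C = N_1 \cup \ldots \cup N_s$
    $N_C \leftarrow$ Net-tree for $C$ [HM06]
    for $p \in C$ do
        Compute $\mathrm{nn}(p, P)$ and store it with $p$

Figure 1: Preprocessing the subspace $\mathcal{M}$ to answer $(1+\varepsilon)$-ANN queries on $P$. Here $c_2$ is a sufficiently large constant.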



algANN($q \in \mathcal{M}$)
    $p \leftarrow$ 2-ANN of $q$ among $C$
        (Use net-tree $N_C$ [HM06] to compute $p$.)
    return the point in $P$ associated with $p$.

Figure 2: Find a $(1+O(\varepsilon))$-ANN in $P$ for a query point $q \in \mathcal{M}$.



5.1 The construction. 

For a set $Z' \subseteq P'$ let

$$h_{\max}(Z') = \max_{(p, h) \in Z'} h.$$

The preprocessing stage is presented in Figure 1 and the algorithm for finding the $(1+\varepsilon)$-ANN for a given query is presented in Figure 2.

5.2 Analysis. 

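Suppose the data-structure returned $y$ and the actual nearest neighbor of $q$ is $n_q$. If $y = n_q$ then the algorithm returned the exact nearest-neighbor to $q$ and we are done. Otherwise, by our general position assumption, we can assume that $y' \ne n'_q$.

Note, there is a WSPD pair $\{A', B'\} \in W$ that separates $y'$ from $n'_q$ in $\mathcal{M}'$; namely, $y' \in A'$ and $n'_q \in B'$.

Let $t = d_{\mathcal{M}'}(a', b')$, where $a'$ and $b'$ are the representative points of $A'$ and $B'$, respectively. Now, let $T = h_{\max}(A') + h_{\max}(B') + t$, $R = c_2 T/\varepsilon$ and $r = \varepsilon T/c_2$.

Lemma 5.1 If $q \notin \mathrm{ball}(\alpha(a), R) \cup \mathrm{ball}(\alpha(b), R)$ then the algorithm returns a $(1+\varepsilon)$-ANN in $P$ to the query point $q$ (assuming $c_2$ is sufficiently large).

Proof: Observe that $d(\alpha(n_q), \alpha(y)) \le d_{\mathcal{M}'}(n'_q, y') \le d_{\mathcal{M}'}(a', b') + \mathrm{diam}(A') + \mathrm{diam}(B') \le t(1 + 1/8 + 1/8) = 5t/4$ by the 8-WSPD separation property. So, by the triangle inequality, we have $d_{\mathcal{X}}(n_q, y) \le h(n_q) + d(\alpha(n_q), \alpha(y)) + h(y) \le h_{\max}(B') + (5/4)t + h_{\max}(A') \le (5/4)T$. Since $n'_q, b' \in B'$, we have $d_{\mathcal{X}}(\alpha(n_q), \alpha(b)) \le d_{\mathcal{M}'}(n'_q, b') \le \mathrm{diam}(B') \le t/8 \le T/8$. Therefore,

$$\begin{aligned}
d_{\mathcal{X}}(q, \alpha(n_q)) &\ge d_{\mathcal{X}}(q, \alpha(b)) - d_{\mathcal{X}}(\alpha(n_q), \alpha(b)) \\
&\ge \frac{c_2 T}{\varepsilon} - \mathrm{diam}(B') \\
&\ge \frac{c_2 T}{2\varepsilon},
\end{aligned}$$

assuming $\varepsilon \le 1$ and $c_2 \ge 1$. Now, $d_{\mathcal{X}}(q, n_q) \ge d_{\mathcal{X}}(n_q, \alpha(n_q))$, and thus by the triangle inequality, we have

$$\begin{aligned}
d_{\mathcal{X}}(q, n_q) &\ge \frac{d_{\mathcal{X}}(q, n_q) + d_{\mathcal{X}}(n_q, \alpha(n_q))}{2} \\
&\ge \frac{d_{\mathcal{X}}(q, \alpha(n_q))}{2} \\
&\ge \frac{c_2 T}{4\varepsilon}.
\end{aligned}$$

This implies that $d_{\mathcal{X}}(q, y) \le d_{\mathcal{X}}(q, n_q) + d_{\mathcal{X}}(n_q, y) \le d_{\mathcal{X}}(q, n_q) + (5/4)T \le (1+\varepsilon)\, d_{\mathcal{X}}(q, n_q)$, assuming $c_2 \ge 5$. ■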

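Lemma 5.2 If $q \in \mathrm{ball}(\alpha(a), c_2 T/\varepsilon) \cup \mathrm{ball}(\alpha(b), c_2 T/\varepsilon)$ then the algorithm returns a $(1+\varepsilon)$-ANN in $P$ to the query point $q$.

Proof: Since the algorithm covered the set $\mathrm{ball}(\alpha(a), c_2 T/\varepsilon) \cup \mathrm{ball}(\alpha(b), c_2 T/\varepsilon)$ with a net of radius $r = \varepsilon T/c_2$, it follows that $d_{\mathcal{X}}(q, C) \le r$. Let $c$ be the point returned by the 2-ANN search for $q$ in $N_C$. We have $d_{\mathcal{X}}(q, c) \le 2r$. Now, the algorithm returned the nearest neighbor to $c$ as the ANN; that is, $y$ is the nearest neighbor of $c$ in $P$.

If $d_{\mathcal{X}}(q, y) \ge T/40$ then

$$\begin{aligned}
d_{\mathcal{X}}(q, n_q) &\ge d_{\mathcal{X}}(c, n_q) - d_{\mathcal{X}}(q, c) \\
&\ge d_{\mathcal{X}}(c, y) - d_{\mathcal{X}}(q, c) \\
&\ge \bigl(d_{\mathcal{X}}(q, y) - d_{\mathcal{X}}(q, c)\bigr) - d_{\mathcal{X}}(q, c) \\
&\ge d_{\mathcal{X}}(q, y) - 4r \\
&= d_{\mathcal{X}}(q, y) - \frac{4\varepsilon T}{c_2} \\
&\ge (1 - \varepsilon/2)\, d_{\mathcal{X}}(q, y),
\end{aligned}$$

by the triangle inequality and if $c_2 \ge 320$. Since $1/(1 - \varepsilon/2) \le 1 + \varepsilon$, we have that $d_{\mathcal{X}}(q, y) \le (1+\varepsilon)\, d_{\mathcal{X}}(q, n_q)$.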

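If $d_{\mathcal{X}}(q, n_q) \ge T/40$ then using similar argumentation to the above, we have that

$$\begin{aligned}
d_{\mathcal{X}}(q, y) &\le d_{\mathcal{X}}(c, y) + d_{\mathcal{X}}(q, c) \\
&\le d_{\mathcal{X}}(c, y) + 2r \\
&\le d_{\mathcal{X}}(c, n_q) + 2r \\
&\le d_{\mathcal{X}}(q, n_q) + 4r \\
&= d_{\mathcal{X}}(q, n_q) + \frac{4\varepsilon T}{c_2} \\
&\le (1+\varepsilon)\, d_{\mathcal{X}}(q, n_q),
\end{aligned}$$

assuming $c_2 \ge 160$.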

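If $d_{\mathcal{X}}(q, n_q) < T/40$ and $d_{\mathcal{X}}(q, y) < T/40$ then $h(n_q) \le d_{\mathcal{X}}(q, n_q) \le T/40$ and $h(y) \le d_{\mathcal{X}}(q, y) \le T/40$. Observe that

$$h_{\max}(A') \le h(y) + \mathrm{diam}(A') \le \frac{T}{40} + \frac{t}{8} \le \frac{3T}{20},$$

and similarly $h_{\max}(B') \le 3T/20$. This implies that

$$\begin{aligned}
\tfrac{3}{4}\, t = t\left(1 - \tfrac{1}{8} - \tfrac{1}{8}\right) &\le d_{\mathcal{M}'}(a', b') - \mathrm{diam}(A') - \mathrm{diam}(B') \\
&\le d_{\mathcal{M}'}(n'_q, y') \\
&= |h(n_q) - h(y)| + d_{\mathcal{X}}(\alpha(n_q), \alpha(y)) \\
&\le T/40 + d_{\mathcal{X}}(\alpha(n_q), n_q) + d_{\mathcal{X}}(n_q, y) + d_{\mathcal{X}}(y, \alpha(y)) \\
&\le T/40 + h(n_q) + \bigl(d_{\mathcal{X}}(n_q, q) + d_{\mathcal{X}}(q, y)\bigr) + h(y) \\
&\le T/40 + 3T/20 + T/40 + T/40 + 3T/20 \\
&\le 3T/8.
\end{aligned}$$

This implies that $t \le T/2$ and thus $T = t + h_{\max}(A') + h_{\max}(B') \le T/2 + 3T/20 + 3T/20 = (4/5)T$. This implies that $T \le 0$. We conclude that $d_{\mathcal{M}'}(a', b') = t \le T \le 0$. That implies that $a' = b'$, which is impossible, as no two points of $P$ get mapped to the same point in $\mathcal{M}'$. (And of course, no point can appear in both sides of a pair in the WSPD.) ■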

The preprocessing time of the above algorithm is dominated by the task of computing for each point of $C$ its nearest neighbor in $P$. Observe that the algorithm would work even if we only use $(1+O(\varepsilon))$-ANN. Using Theorem 4.2 to answer these queries, we get the following result.

Theorem 5.3 Given a set $P \subseteq \mathcal{X}$ of $n$ points, and a subspace $\mathcal{M}$ of doubling dimension $\dim$, one can construct a data-structure requiring space $n \varepsilon^{-O(\dim)}$, such that given a query point $q \in \mathcal{M}$ one can find a $(1+\varepsilon)$-ANN to $q$ in $P$. The query time is $2^{O(\dim)} \log n$. The preprocessing time to build this data-structure is $n \varepsilon^{-O(\dim)} \log n$.

6 Online ANN 

The algorithms of Section 4 and Section 5 require that the subspace of the query points is known, in that we can compute the closest point $\alpha(p)$ on $\mathcal{M}$ given a $p \in \mathcal{X}$, and that we can find a net for a ball on $\mathcal{M}$ using compNet, see Section 2. In this section we show that if we are able to efficiently answer membership queries in regions that are the difference of two balls, then, in fact, we do not require such explicit knowledge of $\mathcal{M}$. We construct an AVD on $\mathcal{M}$ in an online manner as the query points arrive. When a new query point arrives, we test for membership among the existing regions of the AVD. If a region contains the point we immediately output its associated ANN that is already stored with the region. Otherwise we use an appropriate algorithm to find a nearest neighbor for the query point and add a new region to the AVD.

Here we present our algorithm to compute the AVD in this online setting and prove that 
when the query points come from a subspace of low doubling dimension, the number of 
regions created is linear. 

6.1 Online AVD Construction and ANN Queries. 

The algorithm algBuildAVD($P$, $\mathfrak{R}$, $q$) is presented in Figure 3. The algorithm maintains a set of regions $\mathfrak{R}$ that represent the partially constructed AVD. Given a query point $q$ it returns an ANN from $P$ and sometimes adds a region $\mathcal{R}_q$ to $\mathfrak{R}$. The quantity $D'$ is a 2-approximation to the diameter $D$ of $P$, and can be precomputed in $O(n)$ time. Let $p$ be a fixed point of $P$.

The regions created by the algorithm in Figure 3 are the difference of two balls. An example region when the balls $\mathrm{ball}(q, \varepsilon r_2/5)$ and $\mathrm{ball}(y, 5t/(4\varepsilon))$ intersect is shown in Figure 4. The intuition as to why $y$ is a valid ANN inside this region is as follows. Since the distance of $q$ to $y$ is $r_1$, the points inside $\mathrm{ball}(y, \varepsilon r_1/3)$ are all roughly the same distance from $q$. The next distance of interest is the closest point outside this ball. As long as we are inside $\mathrm{ball}(q, \varepsilon r_2/5)$ the points outside $\mathrm{ball}(y, \varepsilon r_1/3)$ are too far and cannot be a $(1+\varepsilon)$-ANN. But if we get too close to $y$ we can no longer be certain that $y$ is a valid $(1+\varepsilon)$-ANN, as it is no longer true that distances to points inside $\mathrm{ball}(y, \varepsilon r_1/3)$ all look roughly the same. In other words, there may be points much closer than $y$, when we are close enough to $y$. Thus in a small enough neighborhood around $y$ we need to zoom in and possibly create a new region. The formal proof of correctness follows from the following lemmas.

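Lemma 6.1 If $d(q, p) \ge 2D' + 2D'/\varepsilon$ then $p$ is a valid $(1+\varepsilon)$-ANN.

algBuildAVD($P$, $\mathfrak{R}$, $q$).
    comment: $p$ is a fixed point in $P$. $D'$ is a 2-approximation to $\mathrm{diam}(P)$
    if $d(q, p) \ge 2D' + 2D'/\varepsilon$ return $p$.
    if $\exists \mathcal{R} \in \mathfrak{R}$ with $q \in \mathcal{R}$
        return point $y$ associated with $\mathcal{R}$.
    Compute $(1+\varepsilon/10)$-ANN $y$ of $q$ in $P$. Let $r_1 = d(q, y)$.
    Let $s \in P$ be the furthest point from $y$ inside $\mathrm{ball}(y, \varepsilon r_1/3)$. Let $t = d(y, s)$.
    if there is no point in $P \setminus \mathrm{ball}(y, \varepsilon r_1/3)$.
        Let $\mathcal{R}_q = \mathrm{ball}(q, D'/4)$.
    else
        Compute $(1+\varepsilon/10)$-ANN $\bar{y}$ of $q$ in $P \setminus \mathrm{ball}(y, \varepsilon r_1/3)$. Let $r_2 = d(q, \bar{y})$.
        $\mathcal{R}_q = \mathrm{ball}(q, \varepsilon r_2/5) \setminus \mathrm{ball}(y, 5t/(4\varepsilon))$.
    $\mathfrak{R} = \mathfrak{R} \cup \{\mathcal{R}_q\}$. Associate $y$ with $\mathcal{R}_q$.
    return $y$ as ANN for $q$.

Figure 3: Answering $(1+\varepsilon)$-ANN and constructing AVD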
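The online procedure of Figure 3 can be prototyped in a few lines. The sketch below is an illustration under stated assumptions rather than the paper's data-structure: points are Euclidean tuples, the $(1+\varepsilon/10)$-ANN computations are replaced by exact linear scans (which are valid ANNs), membership in the cached regions is tested by a linear scan, and $D'$ is taken as the largest distance from the fixed point $p$ (so that $D/2 \le D' \le D$). The class name `OnlineAVD` and its fields are hypothetical.

```python
import math

class OnlineAVD:
    """Caches regions of the form ball(q, outer) minus ball(y, inner)."""

    def __init__(self, points, eps, dist=math.dist):
        self.P = list(points)
        self.eps = eps
        self.dist = dist
        self.p0 = self.P[0]                     # the fixed point p
        # Farthest distance from p0: a 2-approximation of diam(P).
        self.D_approx = max(dist(self.p0, x) for x in self.P)
        self.regions = []                       # tuples (q, outer, y, inner)

    def query(self, q):
        d, eps, Dp = self.dist, self.eps, self.D_approx
        if d(q, self.p0) >= 2 * Dp + 2 * Dp / eps:
            return self.p0                      # far query: any point works
        for c, outer, y, inner in self.regions:
            if d(q, c) <= outer and d(q, y) > inner:
                return y                        # cached answer
        y = min(self.P, key=lambda x: d(q, x))  # exact NN as the ANN
        r1 = d(q, y)
        cut = eps * r1 / 3
        # t: distance from y to the furthest point inside ball(y, cut).
        t = max(d(y, x) for x in self.P if d(y, x) <= cut)
        far = [x for x in self.P if d(y, x) > cut]
        if not far:
            region = (q, Dp / 4, y, -1.0)       # plain ball, nothing removed
        else:
            r2 = min(d(q, x) for x in far)
            region = (q, eps * r2 / 5, y, 5 * t / (4 * eps))
        self.regions.append(region)
        return y

# Data on a 4 x 4 grid, queries along a horizontal line (a 1-dim subspace).
P = [(float(i), float(j)) for i in range(4) for j in range(4)]
avd = OnlineAVD(P, eps=0.2)
queries = [(k / 50.0, 1.3) for k in range(-50, 200)]
answers = [avd.query(q) for q in queries]
```

The linear scan over cached regions is exactly the point-location step that the text identifies as the open issue; the sketch makes no attempt to speed it up.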




Figure 4: An example AVD region $\mathcal{R}_q$.

Proof: Since $D'$ is a 2-approximation to the diameter of $P$, we have $D \le 2D'$. This means $d(q, p) \ge D + D/\varepsilon$. Let $n_q \in P$ be the closest point to $q$. By the triangle inequality,

$$D + D/\varepsilon \le d(q, p) \le d(q, n_q) + d(n_q, p) \le d(q, n_q) + D,$$

so $d(q, n_q) \ge D/\varepsilon$. This together with $d(q, p) \le d(q, n_q) + D$ implies that $d(q, p) \le (1+\varepsilon)\, d(q, n_q)$. ■

Lemma 6.2 If there is no region $\mathcal{R}$ containing $q$ the algorithm outputs a valid $(1+\varepsilon)$-ANN.

Proof: We output $y$ which is a $(1+\varepsilon/10)$-ANN of $q$. ■

The next lemma finally completes the argument.

Lemma 6.3 The $(1+\varepsilon/10)$-ANN $y$ found in the algorithm is a $(1+\varepsilon)$-ANN for any point
$\overline{q} \in \mathcal{R}_q$ constructed in the algorithm.

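Proof: There are two possibilities.

(A) If the region $\mathcal{R}_q$ is the ball $\mathrm{ball}(q, D'/4)$ constructed when there is no point in $P \setminus
\mathrm{ball}(y, \varepsilon r_1/3)$, then it must be the case that $D \le 2\varepsilon r_1/3$ and so

\[ d(q, P) \ge \frac{r_1}{1+\varepsilon/10} \ge \frac{3D}{2\varepsilon(1+\varepsilon/10)} > D/\varepsilon. \]

It is not hard to see that in this case, $y$ is a valid $(1+\varepsilon)$-ANN for any point inside
$\mathrm{ball}(q, D'/4) \subseteq \mathrm{ball}(q, D/4)$.

(B) If the set $P \setminus \mathrm{ball}(y, \varepsilon r_1/3)$ is nonempty then, as in Figure 4, let $\overline{y}$ be a $(1+\varepsilon/10)$-ANN
of $q$ in $P \setminus \mathrm{ball}(y, \varepsilon r_1/3)$ and let $r_2 = d(q, \overline{y})$. We divide the analysis into two cases.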

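(i) If $r_2 \le 2r_1$, let $\overline{q}$ be a new query point in $\mathcal{R}_q$ and let $u \in P$ be its nearest neighbor.
If $u = y$ there is nothing to show. Otherwise, by the triangle inequality we have

\begin{align*}
d(\overline{q}, u) &\ge d(q, u) - \varepsilon r_2/5 \\
 &\ge d(q, y)/(1+\varepsilon/10) - 2\varepsilon r_1/5 \\
 &\ge (1 - \varepsilon/2)\, r_1 .
\end{align*}

Again by the triangle inequality we have,

\[ d(\overline{q}, y) \le d(q, y) + 2\varepsilon r_1/5 = (1 + 2\varepsilon/5)\, r_1 . \]
Clearly we have $d(\overline{q}, y) \le (1+\varepsilon)\, d(\overline{q}, u)$ for $\varepsilon \le 1/5$ and we are done.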



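(ii) If $r_2 > 2r_1$ then following the notation in Figure 4 we let $s$ be the furthest point
from $y$ inside $\mathrm{ball}(y, \varepsilon r_1/3)$ and let $t = d(y, s)$. Let $\overline{q}$ be a new query point and as
before let $u \in P$ be its nearest neighbor. We claim that the nearest neighbor of $\overline{q}$
in $P$ lies in $\mathrm{ball}(y, t)$. To see this, let $z$ be any point in $P \setminus \mathrm{ball}(y, t)$. Noting that the
distance from $q$ to the closest point in $P$ outside $\mathrm{ball}(y, t)$ is at least $r_2/(1+\varepsilon/10)$,
the triangle inequality gives

\begin{align*}
d(\overline{q}, z) &\ge d(q, z) - \varepsilon r_2/5 \\
 &\ge (1 - 3\varepsilon/10)\, r_2 .
\end{align*}
On the other hand, we have

\[ d(\overline{q}, y) \le d(q, y) + \varepsilon r_2/5 < r_2/2 + \varepsilon r_2/5, \]

and so clearly any point in $P \setminus \mathrm{ball}(y, t)$ cannot be the nearest neighbor of $\overline{q}$ for
$\varepsilon \le 1$. Now,

\[ d(\overline{q}, y) \le d(\overline{q}, u) + t. \tag{1} \]

Now $\overline{q} \in \mathrm{ball}(q, \varepsilon r_2/5) \setminus \mathrm{ball}(y, 5t/(4\varepsilon))$. We have,

\[ d(\overline{q}, y) \ge 5t/(4\varepsilon). \]

Then,

\[ d(\overline{q}, u) \ge d(\overline{q}, y) - t \ge \left( \frac{5}{4\varepsilon} - 1 \right) t. \tag{2} \]

Therefore from Equation (1) and Equation (2) we have

\[ d(\overline{q}, y) \le \frac{5}{5 - 4\varepsilon}\, d(\overline{q}, u) \le (1+\varepsilon)\, d(\overline{q}, u), \]

for $\varepsilon \le 1/4$. ■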
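As a check, the two thresholds on $\varepsilon$ at the end of cases (i) and (ii) follow from short calculations, spelled out below. Here $\overline{q}$ denotes the new query point and $u$ its nearest neighbor in $P$, as in the proof; the intermediate forms are one possible way to arrange the algebra and are not taken from the paper.

```latex
% Case (i): we need (1 + 2\varepsilon/5) r_1 \le (1+\varepsilon)(1-\varepsilon/2) r_1.
\[
(1+\varepsilon)\Bigl(1-\tfrac{\varepsilon}{2}\Bigr) - \Bigl(1+\tfrac{2\varepsilon}{5}\Bigr)
  = \tfrac{\varepsilon}{10} - \tfrac{\varepsilon^{2}}{2} \;\ge\; 0
  \iff \varepsilon \le \tfrac{1}{5}.
\]
% Case (ii): Equation (2) gives t \le \frac{4\varepsilon}{5-4\varepsilon}\, d(\overline{q},u),
% and plugging this into Equation (1) yields
\[
d(\overline{q},y) \le \Bigl(1+\tfrac{4\varepsilon}{5-4\varepsilon}\Bigr) d(\overline{q},u)
  = \tfrac{5}{5-4\varepsilon}\, d(\overline{q},u),
\qquad
\tfrac{5}{5-4\varepsilon} \le 1+\varepsilon
  \iff 4\varepsilon^{2} \le \varepsilon
  \iff \varepsilon \le \tfrac{1}{4}.
\]
```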



6.2 Bounding the number of regions created. 

The online algorithm presented in Figure 3 is valid for any general metric space $X$, without
any restriction on the subspace of query points. However, when the query points are
restricted to a subspace of low doubling dimension $\dim$ then one can show that at most
$n \varepsilon^{-O(\dim)}$ regions are created. There are two types of regions created. Regions of the first
type are created when $P \setminus \mathrm{ball}(y, \varepsilon r_1/3)$ is empty and regions of the second type are created
when this condition does not hold. An example region of the second type is shown in
Figure 4. First we show that there are at most $\varepsilon^{-O(\dim)}$ regions created which are of the first
type.

Lemma 6.4 There are at most $\varepsilon^{-O(\dim)}$ regions $\mathrm{ball}(q, D'/4)$ created by query points for
which $P \setminus \mathrm{ball}(y, \varepsilon r_1/3)$ is empty.

Proof: Clearly any two such query points occur at a distance of at least $D'/4$ from each
other. However all of them occur inside a ball of radius $2D' + 2D'/\varepsilon$ around $p$. Thus their
spread is bounded by $16(1 + 1/\varepsilon)$ and so we can have at most $\varepsilon^{-O(\dim)}$ such points. ■
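The spread bound in this proof is the ratio of the largest possible distance between two such query points to the smallest one. Written out (the final step uses the standard packing property of doubling spaces, stated here informally: a point set of spread $\Phi$ in a space of doubling dimension $\dim$ has at most $\Phi^{O(\dim)}$ points):

```latex
\[
\text{spread} \;\le\; \frac{2\,(2D' + 2D'/\varepsilon)}{D'/4}
  \;=\; 16\Bigl(1 + \frac{1}{\varepsilon}\Bigr),
\qquad
\Bigl(16\bigl(1 + \tfrac{1}{\varepsilon}\bigr)\Bigr)^{O(\dim)} = \varepsilon^{-O(\dim)} .
\]
```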

We now consider the regions of the second type created by the algorithm. Consider
the mapped point set $P'$ in the space $\mathcal{M}'$. For this we know that there is a $c$-WSPD
$\{\{A'_1, B'_1\}, \ldots, \{A'_s, B'_s\}\}$ where $c$ is a constant to be specified later and $s = c^{O(\dim)} n$ is
the number of pairs. If a query point $q$ creates a new region of the second type we shall
assign it to the set $Q_{i,1}$ if the pair of points $y', \overline{y}'$ of the algorithm satisfy $y' \in A'_i$ and $\overline{y}' \in B'_i$.
We assign it to the set $Q_{i,2}$ if $y' \in B'_i$ and $\overline{y}' \in A'_i$. For a pair $\{A'_i, B'_i\}$ of the WSPD we
define the numbers $h_{\max}(A'_i) = \max_{(u,h) \in A'_i} h$. Similarly let $h_{\max}(B'_i) = \max_{(z,h) \in B'_i} h$ and
$\ell_i = \max_{u' \in A'_i,\, z' \in B'_i} d(\alpha(u), \alpha(z))$. Let $L_i = \ell_i + h_{\max}(A'_i) + h_{\max}(B'_i)$.
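The quantities just defined for a WSPD pair can be computed directly from the mapped points. The following is a small sketch, assuming each mapped point is stored as a tuple `(alpha, h)` holding its projection onto the query subspace and its height, and that `d_M` is the metric on that subspace; the function names are hypothetical and chosen only for illustration.

```python
from typing import Callable, Hashable, Sequence, Tuple

Mapped = Tuple[Hashable, float]  # (alpha(u), h(u)) for a data point u


def d_mapped(a: Mapped, b: Mapped, d_M: Callable) -> float:
    """Distance between mapped points: d(alpha(u), alpha(z)) + |h(u) - h(z)|."""
    return d_M(a[0], b[0]) + abs(a[1] - b[1])


def h_max(S: Sequence[Mapped]) -> float:
    """Largest height among the mapped points of S."""
    return max(h for _, h in S)


def pair_scale(A: Sequence[Mapped], B: Sequence[Mapped], d_M: Callable) -> float:
    """L_i = l_i + h_max(A) + h_max(B), with l_i the largest projected distance."""
    ell = max(d_M(a, b) for a, _ in A for b, _ in B)
    return ell + h_max(A) + h_max(B)


# Toy example on the real line, with d_M the absolute difference.
A = [(0.0, 1.0), (1.0, 0.5)]
B = [(10.0, 2.0), (12.0, 0.0)]
line = lambda x, y: abs(x - y)
print(pair_scale(A, B, line))  # 12 + 1 + 2 = 15.0
```

By construction, `pair_scale` is an upper bound on `d_mapped(a, b, d_M)` for every `a` in `A` and `b` in `B`, which is how $L_i$ is used in the lemmas that follow.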

The following sequence of lemmas will then establish our claim. The basic strategy is to
show that the set $Q_{i,1}$ has spread $O(1/\varepsilon^2)$. This holds analogously for $Q_{i,2}$ and so we will
only work with $Q_{i,1}$. We will assume that $c$ is a sufficiently large constant and $\varepsilon$ is sufficiently
small.

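Lemma 6.5 $\mathrm{diam}(Q_{i,1}) = O(L_i/\varepsilon)$.

Proof: Let $q$ be a point in $Q_{i,1}$. By assumption we have $y' \in A'_i$ and $\overline{y}' \in B'_i$. By the triangle
inequality,

\begin{align*}
d(y, \overline{y}) &\le d(y, \alpha(y)) + d(\alpha(y), \alpha(\overline{y})) + d(\alpha(\overline{y}), \overline{y}) \\
 &\le h_{\max}(A'_i) + \ell_i + h_{\max}(B'_i) \\
 &= L_i .
\end{align*}

On the other hand, since the point $\overline{y}$ is outside $\mathrm{ball}(y, \varepsilon r_1/3)$ we have $d(y, \overline{y}) \ge \varepsilon r_1/3$. This gives
us $r_1 \le 3L_i/\varepsilon$. By Lemma 2.1, $d_{\mathcal{M}'}(q', y') \le 3r_1 \le 9L_i/\varepsilon$. Also,

\begin{align*}
d_{\mathcal{M}'}(y', \overline{y}') &= d(\alpha(y), \alpha(\overline{y})) + |h(y) - h(\overline{y})| \\
 &\le \ell_i + h_{\max}(A'_i) + h_{\max}(B'_i) = L_i .
\end{align*}

Now let $u'$ be any other point in $A'_i$ (this point could be a $(1+\varepsilon/10)$-ANN found for another
query point in $Q_{i,1}$). By the WSPD separation property we have $d_{\mathcal{M}'}(y', u') \le L_i/c$. Thus we
have

\begin{align*}
\mathrm{diam}(Q_{i,1}) &\le 9L_i/\varepsilon + L_i/c + 9L_i/\varepsilon \\
 &= O(L_i/\varepsilon),
\end{align*}

for $\varepsilon$ small enough. ■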

The next lemma tells us that $r_2 = d(q, \overline{y})$ is in fact $\Omega(L_i)$.

Lemma 6.6 The distances $r_2$ and $L_i$ satisfy $r_2 \ge L_i/18$.

Proof: Let the point $u' \in A'_i$ attain the maximum height, so $h(u) = h_{\max}(A'_i)$. From the
proof of Lemma 6.5,

\[ d_{\mathcal{M}'}(y', u') \le L_i/c. \]

Applying the definition of the distance in $\mathcal{M}'$ this gives us,

\[ h_{\max}(A'_i) - h(y) \le L_i/c, \]

and so $h(y) \ge h_{\max}(A'_i) - L_i/c$. Similarly we have $h(\overline{y}) \ge h_{\max}(B'_i) - L_i/c$. We have
$r_1 = d(q, y) \ge h(y)$ and also $r_2 = d(q, \overline{y}) \ge h(\overline{y})$. Noting that $r_2 \ge r_1/(1+\varepsilon/10) \ge \frac{10}{11} r_1$ we
get,

\[ \frac{21}{10}\, r_2 \ge h_{\max}(A'_i) + h_{\max}(B'_i) - \frac{2L_i}{c}. \tag{3} \]

Let $z' \in A'_i$ and $w' \in B'_i$ be such that $d(\alpha(z), \alpha(w)) = \ell_i$. We have $d_{\mathcal{M}'}(y', z') \le L_i/c$ and
$d_{\mathcal{M}'}(\overline{y}', w') \le L_i/c$ by the WSPD separation property. Noting that $d_{\mathcal{M}'}(q', y') \le 3d(q, y) =
3r_1$ and similarly $d_{\mathcal{M}'}(q', \overline{y}') \le 3r_2$, we have by the triangle inequality,

\[ d_{\mathcal{M}'}(q', z') \le 3r_1 + L_i/c, \]

and similarly,

\[ d_{\mathcal{M}'}(q', w') \le 3r_2 + L_i/c. \]

By the triangle inequality,

\begin{align*}
\ell_i &\le d_{\mathcal{M}'}(z', w') \\
 &\le d_{\mathcal{M}'}(z', q') + d_{\mathcal{M}'}(q', w') \\
 &\le 3r_1 + 3r_2 + \frac{2L_i}{c} \\
 &\le \frac{63}{10}\, r_2 + \frac{2L_i}{c} .
\end{align*}

Thus we have,

\[ \frac{63}{10}\, r_2 \ge \ell_i - \frac{2L_i}{c}. \tag{4} \]

By Equation (3) and Equation (4) we have for $c \ge 8$,

\begin{align*}
9 r_2 &\ge h_{\max}(A'_i) + h_{\max}(B'_i) + \ell_i - \frac{4L_i}{c} \\
 &\ge L_i - L_i/2 = L_i/2,
\end{align*}

which immediately implies our claim. ■
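The last step of this proof adds Equation (3) and Equation (4). Spelled out, under the reading that the coefficient on the left of Equation (3) is $21/10$ (which follows from $r_1 + r_2 \le \frac{11}{10} r_2 + r_2$, using $r_1 \le (1+\varepsilon/10)\, r_2 \le \frac{11}{10} r_2$ for $\varepsilon \le 1$):

```latex
\[
\frac{21}{10}\, r_2 + \frac{63}{10}\, r_2 = \frac{84}{10}\, r_2 \le 9\, r_2 ,
\]
\[
9\, r_2 \;\ge\; \underbrace{h_{\max}(A'_i) + h_{\max}(B'_i) + \ell_i}_{=\,L_i} - \frac{4 L_i}{c}
  \;=\; L_i \Bigl(1 - \frac{4}{c}\Bigr) \;\ge\; \frac{L_i}{2}
  \quad\text{for } c \ge 8,
\]
\[
\text{hence}\quad r_2 \ge \frac{L_i}{18}.
\]
```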

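We now show that the points belonging to $Q_{i,1}$ are reasonably distant from each other.

Lemma 6.7 Let $q, \overline{q} \in Q_{i,1}$, where $\overline{q}$ comes in after $q$. Then

\[ d_{\mathcal{M}'}(q', \overline{q}') = d(q, \overline{q}) \ge \varepsilon r_2/5. \]

Proof: By Lemma 2.1 we have $d_{\mathcal{M}'}(q', \overline{q}') = d(q, \overline{q})$ and so we will only show that $d(q, \overline{q}) \ge
\varepsilon r_2/5$.

If $\overline{q} \notin \mathrm{ball}(q, \varepsilon r_2/5)$ then we have nothing to prove. Otherwise, since $\overline{q}$ created a new
region it must be the case that $\overline{q} \in \mathrm{ball}(y, 5t/(4\varepsilon))$. We show that this leads to a contradiction.

The first observation is that we must have $r_2 \ge 2r_1/\varepsilon$. To see this notice that since
$\overline{q} \in \mathrm{ball}(q, \varepsilon r_2/5) \cap \mathrm{ball}(y, 5t/(4\varepsilon))$ it must be the case that,

\[ \varepsilon r_2/5 + 5t/(4\varepsilon) \ge r_1. \]

But $t \le \varepsilon r_1/3$ and so

\[ \varepsilon r_2/5 + 5r_1/12 \ge r_1, \]

which gives $r_2 \ge 25 r_1/(12\varepsilon) \ge 2r_1/\varepsilon$.

The next observation relates the distances $L_i$, $r_1$, $r_2$ and $d_{\mathcal{M}'}(y', \overline{y}')$. It is easy to see that
$d_{\mathcal{M}'}(y', \overline{y}') \le L_i$. On the other hand,

\begin{align*}
d_{\mathcal{M}'}(y', \overline{y}') &\ge d_{\mathcal{M}'}(q', \overline{y}') - d_{\mathcal{M}'}(q', y') \\
 &\ge r_2 - 3r_1 \\
 &\ge 2r_1/\varepsilon - 3r_1 \\
 &\ge r_1/\varepsilon
\end{align*}

for $\varepsilon \le 1/3$. In particular, $r_1 \le \varepsilon L_i$.

In terms of $r_2$ we have

\begin{align*}
d_{\mathcal{M}'}(y', \overline{y}') &\ge r_2 - 3r_1 \\
 &\ge r_2 - 3\varepsilon r_2 \\
 &\ge r_2/2 \ge L_i/36,
\end{align*}

by Lemma 6.6 and for $\varepsilon \le 1/6$.

Now $\overline{q}$ lies inside $\mathrm{ball}(y, 5t/(4\varepsilon))$ and as $t \le \varepsilon r_1/3$ we have $d(y, \overline{q}) \le 5r_1/12$.
Let $z$ be an arbitrary point in $B_i$ and notice that,

\begin{align*}
d_{\mathcal{M}'}(\overline{q}', z') &\ge d_{\mathcal{M}'}(y', z') - d_{\mathcal{M}'}(\overline{q}', y') \\
 &\ge d_{\mathcal{M}'}(y', \overline{y}') - d_{\mathcal{M}'}(\overline{y}', z') - d_{\mathcal{M}'}(\overline{q}', y') \\
 &\ge L_i/36 - L_i/c - 5r_1/4 \\
 &\ge L_i/36 - L_i/c - \frac{5\varepsilon}{4} L_i \\
 &\ge L_i/40,
\end{align*}

for sufficiently small $\varepsilon$ and sufficiently large $c$. This further implies that $d(\overline{q}, z) \ge L_i/120$.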

Denote by $C_i$ the set $A_i \cup \{s\}$ where recall that $s$ is the furthest point from $y$ in
$\mathrm{ball}(y, \varepsilon r_1/3)$ and $d(y, s) = t$. Notice that it is possible that $s \notin A_i$. A subtle and technical
point is that we require $t \ne 0$. This can be enforced by changing the definition of $\mathcal{R}_q$
to $\mathrm{ball}(q, \varepsilon r_2/5) \setminus (\mathrm{ball}(y, 5t/(4\varepsilon)) \setminus \{y\})$. Even with this modification, the algorithm and the
results established so far are correct. However now $t$ cannot be 0. To see this notice that
by assumption $\overline{q} \in \mathrm{ball}(y, 5t/(4\varepsilon)) \setminus \{y\}$. If $t = 0$ then this means the only point inside
$\mathrm{ball}(y, 5t/(4\varepsilon))$ is $y$. But $\overline{q}$ cannot be $y$. So $t \ne 0$ and $s \ne y$. Our next observation is that $\overline{q}$ is
close to $C_i$. Let $u \in A_i$. Then,

\begin{align*}
d_{\mathcal{M}'}(\overline{q}', u') &\le d_{\mathcal{M}'}(\overline{q}', y') + d_{\mathcal{M}'}(y', u') \\
 &\le 5r_1/4 + L_i/c \\
 &\le \frac{5\varepsilon}{4} L_i + L_i/c,
\end{align*}

which also means that,

\[ d(\overline{q}, u) \le \frac{5\varepsilon}{4} L_i + L_i/c. \]

We also have that

\[ d(\overline{q}, s) \le 5r_1/12 + \varepsilon r_1/3 \le r_1/2 \le \varepsilon L_i/2. \]

And then we observe that we can choose $c$ large enough and $\varepsilon$ small enough so that for
any $u \in C_i$ and $z \in B_i$ we have

\[ d(\overline{q}, z) \ge 2\, d(\overline{q}, u). \]

Notice that this implies trivially that $B_i \cap C_i = \emptyset$.

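Let $w \in A_i$ be the $(1+\varepsilon/10)$-ANN found by the algorithm for $\overline{q}$. Since $d(\overline{q}, y) \le 5t/(4\varepsilon)$, it
follows that $d(\overline{q}, w) \le \frac{5}{4\varepsilon}(1+\varepsilon/10)\, t$. Denote $d(\overline{q}, w)$ by $x$. The next observation is that we
must have $C_i \subseteq \mathrm{ball}(w, \varepsilon x/3)$. This is true because if it were not the case that $C_i$ is entirely
inside $\mathrm{ball}(w, \varepsilon x/3)$ then by the last observation, the $(1+\varepsilon/10)$-ANN of $\overline{q}$ in $P \setminus \mathrm{ball}(w, \varepsilon x/3)$
would belong to $C_i \setminus \mathrm{ball}(w, \varepsilon x/3)$, whereas by assumption it belongs to $B_i$ which is disjoint
from $C_i$.

Then we have $y, s \in \mathrm{ball}(w, \varepsilon x/3)$ and so

\[ t = d(y, s) \le 2\varepsilon\, d(\overline{q}, w)/3 \le \frac{2\varepsilon}{3} \cdot (1+\varepsilon/10) \cdot \frac{5t}{4\varepsilon} < t \]

for $\varepsilon$ sufficiently small. This is a contradiction because $t > 0$. This concludes the proof. ■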
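The final strict inequality in this chain can be checked explicitly; the factors of $\varepsilon$ cancel, so "sufficiently small" here only needs $\varepsilon < 2$:

```latex
\[
\frac{2\varepsilon}{3}\cdot\Bigl(1+\frac{\varepsilon}{10}\Bigr)\cdot\frac{5t}{4\varepsilon}
  \;=\; \frac{5}{6}\Bigl(1+\frac{\varepsilon}{10}\Bigr)\, t \;<\; t
  \iff 1+\frac{\varepsilon}{10} < \frac{6}{5}
  \iff \varepsilon < 2 .
\]
```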

The following now follows easily.

Lemma 6.8 We have that $\max(|Q_{i,1}|, |Q_{i,2}|) = \varepsilon^{-O(\dim)}$.

Proof: From Lemma 6.5, Lemma 6.6 and Lemma 6.7 it follows that the spread of the set
$Q_{i,1}$ is bounded by

\[ O\!\left( \frac{L_i/\varepsilon}{\varepsilon L_i} \right) = O(1/\varepsilon^2). \]

Since the mapped points of $Q_{i,1}$ lie in $\mathcal{M}'$, which is a space of doubling dimension $O(\dim)$, it follows that $|Q_{i,1}| =
\varepsilon^{-O(\dim)}$. The same argument works for $Q_{i,2}$. ■

The next lemma bounds the number of regions created.

Lemma 6.9 The number of regions created by the algorithm is $n/\varepsilon^{O(\dim)}$.

Proof: As shown in Lemma 6.4 the number of regions of the first type is bounded by $\varepsilon^{-O(\dim)}$.
Consider a region $\mathcal{R}_q$ of the second type. For this point $q$ the algorithm found a valid $y$
and $\overline{y}$. Now from the definition of a WSPD there is some $i$ such that $y' \in A'_i$, $\overline{y}' \in B'_i$ or
$y' \in B'_i$, $\overline{y}' \in A'_i$. In other words there is some $i$ such that $q \in Q_{i,1}$ or $q \in Q_{i,2}$. As shown in
Lemma 6.8 the size of each of these is bounded by $\varepsilon^{-O(\dim)}$. Since the total number of such
sets is $2s$ where $s = n c^{O(\dim)}$ is the number of pairs of the WSPD, it follows that the total
number of regions created is bounded by $(c/\varepsilon)^{O(\dim)}\, n$, which is just $n \varepsilon^{-O(\dim)}$ for $\varepsilon$ sufficiently
small. ■
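The count in this proof is the sum of the two types of regions. Written as a single line, with $s = n c^{O(\dim)}$ the number of WSPD pairs and $c$ a constant:

```latex
\[
\underbrace{\varepsilon^{-O(\dim)}}_{\text{first type (Lemma 6.4)}}
\;+\;
\underbrace{2s \cdot \varepsilon^{-O(\dim)}}_{\text{second type (Lemma 6.8)}}
\;=\; \varepsilon^{-O(\dim)} + 2\, n\, c^{O(\dim)}\, \varepsilon^{-O(\dim)}
\;=\; n\, (c/\varepsilon)^{O(\dim)}
\;=\; n\, \varepsilon^{-O(\dim)} .
\]
```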

We summarize the results in this section. 

Theorem 6.10 The online algorithm presented in Figure 3 always returns a $(1+\varepsilon)$-ANN. If
the query points are constrained to a subspace of doubling dimension $\dim$, then the maximum
number of regions created for the online AVD by the algorithm is $n/\varepsilon^{O(\dim)}$.

7 Conclusions 

In this paper, we looked at the ANN problem when the data points can come from an arbitrary
metric space (not necessarily a Euclidean space) but the query points are constrained to
come from a subspace of low doubling dimension. We demonstrate that this problem is
inherently low dimensional by providing fast ANN data-structures obtained by combining
and extending ideas that were previously used to solve ANN for spaces with low doubling
dimensions.

Interestingly, one can extend Assouad's type embedding to an embedding that $(1+\varepsilon)$-preserves
distances from $P$ to $\mathcal{M}$ (see [HM06] for an example of a similar embedding into
the $\ell_\infty$ norm). This extension requires some work and is not completely obvious. The target
dimension is roughly $1/\varepsilon^{O(\dim)}$ in this case. If one restricts oneself to the case where both $P$
and $\mathcal{M}$ are in Euclidean space, then it seems one should be able to extend the embedding
of [GK09] to get a similar result, with the target dimension having only polynomial dependency
on $\dim$. However, computing either embedding efficiently seems quite challenging.
Furthermore, even if the embedded points are given, the target dimension in both cases is
quite large, and yields results that are significantly weaker than the ones presented here.

The on-the-fly construction of AVD without any knowledge of the query subspace
(Section 6) seems like a natural candidate for a practical algorithm for ANN. Such an
implementation would require an efficient way to perform point-location in the generated regions. We
leave the problem of developing such a data-structure as an open question for further
research. In particular, there might be a middle ground between our two ANN data-structures
that yields an efficient and practical ANN data-structure while having very limited access to
the query subspace.